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Accurate representation of the molecular electrostatic potential, which is often expanded in dis¬ 
tributed multipole moments, is crucial for an efficient evaluation of intermolecular interactions. Here 
we introduce a machine learning model for multipole coefficients of atom types H, C, O, N, S, F, and 
Cl in any molecular conformation. The model is trained on quantum chemical results for atoms in 
varying chemical environments drawn from thousands of organic molecules. Multipoles in systems 
with neutral, cationic, and anionic molecular charge states are treated with individual models. The 
models’ predictive accuracy and applicability are illustrated by evaluating intermolecular interaction 
energies of nearly 1,000 dimers and the cohesive energy of the benzene crystal. 


I. INTRODUCTION 

Efficient evaluation of intermolecular (also termed as 
van der Waals [H-Q) interactions is an essential part of 
all classical molecular dynamics simulations. Electro¬ 
static, induction, dispersion, and exchange-repulsion are 
the most frequently encountered non-bonded contribu¬ 
tions to the energy of interaction between molecules. In 
order to boost computational efficiency, these contribu¬ 
tions are often projected on pairwise-additive functions, 
the sum of which then approximates the potential en¬ 
ergy surface of a molecular assembly. Many-body effects 
(e.g., induction and dispersion) are accounted for effec¬ 
tively, by an appropriate parametrization of the potential 
energy surface. These parametrizatons are, by construc¬ 
tion, state-point dependent and rely on either measured 
or first-principles evaluated thermodynamic properties of 
a molecular assembly at a certain state point. For exam¬ 
ple, partial charges and Lennard-Jones parameters are 
often adjusted to fit the density, heat of vaporization, 
and other thermodynamic properties Q. 

Force field transferability and accuracy can of course 
be improved by retaining many-body contributions. The 
decisive advantage of this approach, which justifies extra 
computational effort, is that these terms can be evaluated 
perturbatively, i.e., by first calculating electronic prop¬ 
erties of non-iteracting molecules using first-principles 
methods and then accounting for electrostatic (first or¬ 
der), induction (second order), and dispersion (higher 
orders) contributions in a perturbative way Such 
parametrizations do not require experimental input, are 
state-point independent and, as such, can be used to pre¬ 
screen chemical compounds in silico. 

In this approach, however, even the molecular electro¬ 
static potential must be evaluated for every single molec- 
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ular conformation, requiring electronic structure calcula¬ 
tions at practically every molecular dynamics step. It has 
also been pointed out that the multipole-moment (MTP) 
description of the electrostatics must include not only 
atomic charges but also higher moments (e.g., dipoles 
and quad rupoles) [H, improving free-energy calcula¬ 
tions , spectroscopic signatures [H, [l^; and dy¬ 

namics w 

Avoiding the need for frequent quantum-chemical cal¬ 
culations has motivated the development of fast predic- 
tion methods, such as machine learning (ML) [isl - fl^ . 
With ML we refer to statistical algorithms that extract 
correlations by training on input/output data, and that 
improve in predictive power as more training data is 
added [3. While ML models for the fittiM of poten¬ 
tial energies have been in use for decades [1^, the pos¬ 
sibility to infer point charges, MTPs and polarizabilities 
has been investigated only recently (2(]| - l^ . These ap¬ 
proaches interpolate between a large number of confor¬ 
mations to accurately describe the effects of changes in 
the geometry. The accuracy that is reached comes at a 
price: The specificity of the learning procedure limits its 
applicability to the given molecule of interest. Instead 
of training electrostatic models for every new molecule, 
here we construct a transferable MTP model which can 
be applied not only to different molecular conformers but 
also atom types. 

The paper is organized as follows. We first de¬ 
scribe how to build a machine learning MTP model 
that predicts static, atomic point charges, dipoles, and 
quadrupoles for H, C, O, N, S, F, and Cl atom types 
in specific chemical environments. Next, the resulting 
electrostatic interactions are combined with a classical 
many-body dispersion (MBD) in order to validate 
the model by estimating intermolecular energies of nearly 
1,000 molecular dimers as well as the cohesive binding 
energy of the benzene crystal. We find that the machine¬ 
learning model retains an accuracy similar to the same 
model parametrized from individual quantum-chemical 
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calculations. 


II. METHODS 

The following describes the ML model, the base¬ 
line property used in the delta-learning procedure, the 
dataset, and the description of the reference MTPs. 


A. Machine Learning model 

We rely on supervised learning to construct a kernel- 
ridge regression which generalizes the linear ridge re¬ 
gression model (i.e., linear regression with regularizer A) 
by mapping the input space x into a higher-dimensional 
“feature space,” thereby casting the problem in a 

linear way [1^, [2^ . The strength of the method comes 
from avoiding the actual determination of (j) thanks to 
the so-called kernel trick (2^: Since the ML algorithm 
only requires the inner product between data vectors in 
feature space, one can apply a kernel function k(x, x') to 
compute dot products within input space, thereby leav¬ 
ing the feature space entirely implicit. As a result, the 
problem is reformulated from a u-dimensional input space 
(i.e., the dimensionality of each data vector) into an n- 
dimensional space spanned by the number of samples in 
the training set. This characteristics implies that the 
larger n, the better the prediction ought to be—thus the 
denomination of a supervised learning method. 

Here, we build on the A-ML approach (2^, which es¬ 
timates the difference between the desired property and 
an inexpensive baseline model that accounts for the most 
relevant physics. More specifically, a refined target prop¬ 
erty p{x) is predicted from baseline property p^°^ (see 
Sec. Ill Cl) plus the ML-model A 

p(x)=p^”(a;) + A(a:,pV°--), (1) 

where x corresponds to the representation vector— 
or descriptor—of the input sample (e.g., query molecule). 
A corresponds to the standard kernel-ridge regression 
model of the difference between baseline and target prop¬ 
erty constructed for n training samples, 

n 

A(a;,p^”) = ^ai [k{x,Xi) + fc'(p'^°‘',p7°'')] > (2) 

i=l 

where is the weight given to training molecule i. 
These weights are determined by best reproducing the 
reference property p^®^(x) for each sample in the train¬ 
ing set according to the closed-form solution a = 
(K -f K' -f Al)”^ - p^”), where p^®^ - p^” is the 

vector of training properties, i.e. difference between ref¬ 
erence and baseline, and K and K' are the two kernel 
matrices. Note that in Eq. [21 we have included represen¬ 
tation and baseline property in the kernel, each having a 
different width in their respective kernel functions. 


ML maps an input representation vector x into a scalar 
value of similarity. Thus, before applying ML to predict 
atomic MTPs, the information contained in the three- 
dimensional structure of a molecule must be encoded in 
a vector of numbers i.e., its representation or descrip¬ 
tor. Ideally, this information should reflect symmetries 
of molecular structures with respect to rotations, trans¬ 
lations, reflections, atom index permutations, etc. Here, 
we rely on the Coulomb matrix descriptor [29j, 
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where i and j index atoms in the molecule, Zi is atom 
f’s atomic number, and Ri represents its Cartesian co¬ 
ordinates. Note that the Coulomb matrix not only en¬ 
codes inverse pairwise distances between atoms but also 
the chemical elements involved. As different molecules 
have different numbers of atoms, their Coulomb matri¬ 
ces will vary in size. Distant neighbors are expected to 
have a comparatively small impact on a prediction, such 
that the inclusion of all neighbors can prove inefficient for 
large molecules. Given a set of molecules, we pad matri¬ 
ces with zeros such that their size amounts to n x n, 
where n is the number of closest neighboring atoms con¬ 
sidered [ 2 ^. In the following, we set n = 4. Given a 
molecule’s d atoms, there are d individual atomic MTP 
samples for the ML to learn from. For each, an indi¬ 
vidual Coulomb matrix is built in which the atom of 
interest fills up the first row/column, while the indices 
of the surrounding n atoms are sorted according to the 
atoms’ Euclidean distances to the query atom. As such, 
we coarsen our descriptor to contain at least the first 
shell of n covalently bound neighbors, and atoms that 
only differ in their environment at larger distances will 
be assigned the same MTP. We have found n = 4 to 
correspond to a reasonable compromise between compu¬ 
tational efficiency and performance. Note, however, that 
while such choices of descriptor typically do affect the 
model’s performance for given training sets, other de¬ 
scriptor choices could work just as well—as long as they 
meet the requirements and invariances necessary for the 
ML of quantum properties . 


In the context of applying ML to the prediction of ten- 
sorial quantities, such as MTPs, properties p^°'^{x) and 
p{x) will be expressed as vectors of size m —the number 
of independent coefficients of the tensor of interest (e.g., 
1 for a scalar charge, 3 for a vector dipole moment, 5 for 
a traceless second-rank tensor quadrupole). We express 
MTP moments with their minimal number of indepen¬ 
dent coefficients by using the spherical-coordinate repre¬ 
sentation. We recognize that the kernel matrices, K and 
K', will remain unmodified when learning/predicting dif¬ 
ferent tensor components of the same input data vector. 
Finally, the weights a are expressed as a matrix of size 
mxn, which naturally reduces to a vector when predict¬ 
ing a scalar quantity. 
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For this work, we have used the Laplacian kernels, 

k{xi,Xj) = exp ^ (4) 

/ |pVor_ Vor|\ 

k'ipY°\pJn = exp ' j , (5) 

where a and ( are free parameters, | • | corresponds to the 
Manhattan, or city block, Li norm. This combination 
of kernel functions and distance measure has previously 
been shown to yield the best performance for the mod¬ 
eling of molecular atomization energies and other elec- 
tronic properties using the Coulomb-matrix representa¬ 
tion [^, 1^. Nt is the number of occurrences of the 
chemical element type to which atom i belongs. As a re¬ 
sult Nt normalizes the width to be consistent with train¬ 
ing set size of a given chemical element. We report below 
(Table II) the strong variance of occurrence numbers of 
chemical elements in the employed training set. Hyper¬ 
parameter optimization on 85% of the elements encom¬ 
passing the training set (see below) yielded cr = 0.005, 
C = 0.002, and A = 10“®. We have subsequently used 
these parameters for element-specific models throughout. 
Combining Eqs. 21 and [5l the A-learning ML model 
predicts the deviation from the Voronoi baseline for a 
new query atom x of element t with Voronoi property 
according to. 


_ / Ix-x-l 

A(a;,p^”) = fe~ -be j. (6) 

i=l ^ 

For the modeling of MTP in positively and negatively 
charged molecules (±I e), we have trained respectively 
different ML models for the same set of molecules, sys¬ 
tematically assigning the corresponding global molecular 
charge and assuming a doublet state. 


B. Multipole moments 


Molecular electron densities were computed using den¬ 
sity functional theory calculations at the M06-2X level of 
theory [sj and cc-pVDZ basis set [s^- All ab-initio cal¬ 
culations were performed using the GaussianOQ program 


The Generalized Distributed Multipole Analysis 
(CDMA) 0 allowed us to partition the density into 
atomic MTPs up to quadrupoles, where we relied on grid- 
based quadrature (i.e., switch value of 4). The same pro¬ 
tocol was applied to train the ML models for positively 
and negatively charged molecules, after reassigning the 
global charge of each molecule. 

The reference MTPs, obtained from the distributed 
multipole analysis were rotated into a molecular reference 
frame, which was constructed from the (sorted) eigenvec¬ 
tors of the molecule’s moment of inertia tensor centered 
at the atom in question. To ensure uniqueness, we set 


the positive axis of each vector such that its scalar prod¬ 
uct with the vector pointing from the atom of interest to 
the molecule’s center of mass is positive. For linear (e.g., 
diatomic) molecules, we assign the interatomic direction 
as the first axis and arbitrarily construct two orthogo¬ 
nal axes. After the ML prediction, we rotated back the 
MTPs in the original, global frame. 

All MTP interactions were computed in CHARMM 
using the MTPL module [ll|,[13 ^ while our in-house 
code used for the many-body dispersion energies [23 is 
freely available online [3^ . 


C. Voronoi partitioning of the charge density 

The Voronoi baseline model relies on a systematic, 
geometry-dependent estimation of a system’s underlying 
charge density. Reference atomic MTP coefficients are 
extracted from the partitioning of said density (see be¬ 
low for details), where monopole, dipole, and quadrupole 
contributions are given by 



^dr ni(r). 

(7) 

^ . 

^dr ni(r) Tq,, 

(8) 

^5 

II 

^dr ni{r) rp, 

(9) 


respectively, where ni(r) denotes the partitioned density 
attributed to atom i, as a function of spatial coordinate 
r, and a^/3 G {x,y,z}. Rather than being derived from 
quantum-chemical calculations, ni{r) is constructed as a 
Gaussian-based atomic density 

where is the position of nucleus i, and is the chemi¬ 
cal element’s free-atom radius which is fixed independent 
of molecular environment or geometry. For this, we have 
used parameters reported elsewhere [H, [l^]. Atomic 
densities |n,,(r)} are partitioned according to a Voronoi 
scheme [^, whereby only the closest atom contributes 
to a given spatial coordinate. The Euclidean distance 
provides the distance metric to identify a region TZp as¬ 
sociated to atom p 

TZp = {r G I d(r, Tp) < d(r, r^) for all j ^p}- (II) 

Fig. [1] illustrates the Voronoi-based density partitioning 
between the atoms of water. Each color corresponds 
to the atomic density of the corresponding atom. We 
recently introduced a similar protocol to effectively es¬ 
timate atomic polarizabilities which serve as input for 
many-body dispersion interactions [^ . 

Note that the Voronoi model contains no free 
parameter—the free-atom radii being applied without 
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FIG. 1. Cartoon of Voronoi-based density partitioning for 
a water molecule (rendered with VMD [4^). Dashed lines 
delineate the partition boundaries. The axis system illustrates 
the orientation of the water molecule’s global frame for the 
calculations presented in Sec. IIII Al aligned in the xy plane 
with the oxygen atom pointing toward the y axis. 


prior fitting. Though the model hardly reproduces any of 
the reference MTP coefficients, it provides a qualitative 
evaluation of the coefficients. In particular, the baseline 
model reproduces elementary symmetries of the system 
that are entirely determined by the geometry, e.g., a pla¬ 
nar molecule cannot generate an orthogonal dipole mo¬ 
ment. 

While we compute Eqns.[71IHl andlHlin Cartesian coor¬ 
dinates, we subsequently convert them to their spherical 
counterparts where k denotes the rank (e.g., k = 0 

for monopoles) and m indexes the (real) component of 
the MTP (see Stone @ for more details). Given a molec¬ 
ular structure, we estimate for each atom the baseline 
property = {Qoo, Qim, Q 2 m}, where m runs over 
the 2k + 1 elements of an MTP of order k. 


D. Molecular dataset 

To refine atomic properties beyond the baseline pre¬ 
diction, we train the transferable ML algorithm on 
2, 896 neutral molecules obtained from the Ligand.Info 
database [i^, totaling 82.1 kilo atoms. Atoms have been 
segregated between training and prediction pools ran¬ 
domly. The database provides three-dimensional coordi¬ 
nates of small, organic molecules. We focus exclusively 
on a subset of neutral molecules that include elements H, 
C, N, O, S, F, and Cl. 

III. RESULTS 

A. Voronoi-based baseline evaluation 

To illustrate the applicability of the Voronoi-based 
baseline evaluation of MTP coefficients, we compare its 


prediction with the reference MTP coefficients obtained 
from ab initio methods. Given that MTPs are inherently 
axis-system dependent (apart from the monopole), we 
first describe the global frame used for the water molecule 
in Fig. [TJ Inherent symmetries of the geometry impose 
some coefficients to be zero, e.g., there can be no dipole 
moment along the x or z directions due to the molecule’s 
C 2 v symmetry. While the Voronoi-baseline does not even 
qualitatively reproduce the non-zero coefficients—due to 
the method not entailing any prior parametrization—its 
ability to impose the right symmetry is very desirable. 
The same kind of behavior is also shown for a carbon 
atom on benzene, or the carbonyl oxygen of formic acid 
in Tab. H For comparison. Tab. U also shows already the 
corresponding ML augmented MTP result. As such, the 
baseline recovers an important aspect of the underlying 
symmetry, which the augmenting ML model no longer 
needs to account for. 



Qoo 

Qio Qiic Qiis 

Q20 Q21C Q21S Q22C Q22S 

Vor 

A-ML 

Ref 

0.04 

-0.13 

-0.39 

water c 

0 0 -0.12 
-0.01 -0.02 0 

0 0.01 -0.40 

xygen 

0.43 0 0 -0.21 0.01 

0.16 -0.03 0.04 -0.18 0 

-0.92 0 0 0.45 0 

Vor 

A-ML 

Ref 

0.01 

-0.35 

-0.45 

formic acid cai 
0 0 0 
-0.30 -0.03 0.03 
-0.10 -0.15 0 

bonyl oxygen 

0.08 0 0 0 0 

0.55 0.10 -0.04 -0.10 0.15 

0.38 0.13 0 -0.32 0 

Vor 

A-ML 

Ref 

0.01 

-0.10 

-0.03 

benzene 
-0.01 0.01 0 
-0.04 -0.10 -0.09 
0 0 0 

carbon 

-0.01 0 0 0.07 0.01 

-0.73 -0.24 0.19 0.17 -0.05 
-0.65 0 0 -0.14 0 


TABLE I. MTP coefficients of oxygen in water (see Fig. [T]) 
and in the carbonyl bond of formic acid, as well as the MTPs 
of carbon in benzene. “Vor,” “A-ML,” and “ref” correspond 
to the Voronoi-based property evaluation, the delta-learned 
prediction across chemical compound space (see the A-ML- 
85 model below), and the ab initio data, respectively. All 
coefficients are expressed in units eA*, where I is the MTP’s 
rank. The comparatively low accuracy of the A-ML model 
for water is rationalized in the main text. 


B. A-ML MTP model trained and tested across 
chemical space 

In principle, the above-mentioned Coulomb matrix en¬ 
codes enough chemistry to train all chemical elements. 
Memory limitations of the kernel-ridge regression, how¬ 
ever, make atom-type specific models better tractable. 
We now investigate the A-ML model’s capabilities to 
predict MTP coefficients across chemical space, one for 
each of those chemical elements that are most frequent 
in small, organic molecules (see above). The ML model 
has been trained on various fractions of the considered 
dataset’s 82 kilo atoms. 
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(a) 



Training fraction 



FIG. 2. (a) Mean absolute prediction error (MAE) as a function of training set percentage of an 82k-atom database obtained 
from neutral molecules. MAEs are given in eA* units with I being the MTP rank. Errors correspond to MTP A-ML model 
predictions for individual components of atomic monopole, dipole, and quadrupole moments shown in left, mid, and right-hand 
side panel, respectively, for all elements present. Scatter correlation plots for all components of (b) monopoles, (c) dipoles, 
and (d) quadrupoles, as predicted with the A-ML-85 model. Colors correspond to the atom legend in (a). The outliers in the 
monopole correlation curve correspond to sulfur-oxide groups (see main text). 


Fig. [5] (a) displays error saturation curves for indi¬ 
vidual chemical elements. These monotonically decay¬ 
ing learning curves are presented as a function of train¬ 
ing size of the dataset, where the predicted mean ab¬ 
solute error (MAE) is calculated across the remaining 
atoms not included in the training set. The finding of 
monotonically decreasing error with training set size rep¬ 
resents numerical evidence for a crucial feature of the 
supervised-learning working hypothesis: The accuracy 
of the A-ML model of MTPs improves as more data 
is being added. Monopole coefficients have the fastest 
learning rate, quickly reaching MAE between 0.1 e and 
0.01 e for all elements. Differences between elements are 
simply due to their relative frequency in the data base. 
More specifically, since hydrogens and carbons predomi¬ 
nate in small organic molecules they provide much larger 
training sets, and consequently more accurate ML models 
when measured in terms of percentage of training data 
being used. The scatter correlation plot between pre¬ 
dicted and reference monopoles for each element is given 
in Fig. [2] (b) for the largest training fraction used: 85% 
(denoted “A-ML-85”), the exception being the hydrogen 
A-ML model trained on only 75% of the dataset given its 
converged accuracy. The monopoles modeled by A-ML- 


85 reach high Pearson correlation coefficients: « 97%, 

except F and Cl, for which = 17% and 68%, respec¬ 
tively. Such poor performance is explained by the small 
size training data set available for these elements. The 
outliers in the monopole scatter plot (Fig. [5] (b)) corre¬ 
spond to sulfur-oxide groups. Also here, the few samples 
of these groups in the training set results in significant 
prediction errors. 

Predicted dipoles show a MAE across elements be¬ 
tween 0.02 eA and 0.15 eA, depending on training set 
size. The heterogeneity of the chemical environments of 
the elements is reflected in the ML-model’s performance. 
The A-ML dipole moments of hydrogens are extremely 
accurate—most likely due to hydrogens showing weak 
overall MTP moments and due to their repeating saturat¬ 
ing bonding pattern. By contrast, carbon atoms, albeit 
being nearly as frequent as hydrogens in the database, 
have MTP ML models with significantly larger MAE. We 
rationalize the A-ML’s relative difficulty to predict this 
element by the large chemical variety carbon exhibits, 
i.e., strongly varying hybridization states and possible 
bonding with all other elements. Also note the reversal 
of the relative offset of the F and Cl learning curves as one 
proceeds from monopole to dipole moments, despite the 
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fact that there are roughly twice as many Cl as F atoms 
in the data base. This effect is possibly due to chlo¬ 
rine’s larger polarizability, which implies that the chemi¬ 
cal environment of the atom plays a more important role 
for the dipole-moment, turning the ML-based modeling 
into a higher-dimensional and thereby more challenging 
statistical-learning problem. Such effects, however, can 
only fully be explained through an in-depth study with 
significantly larger data sets. The scatter plot for pre¬ 
dicted versus reference dipole moments is shown for all 
elements in Fig. [5] (c) for A-ML-85. Clearly, the corre¬ 
lation is worse in comparison to monopoles « 50%), 
which is in line with what one would expect for a more 
complex vectorial quantity. 

Of all MTPs considered, quadrupoles represent the 
most complex and challenging property. Not surpris¬ 
ingly, the resulting ML models yield the largest MAE 
for our training set: between 0.02 eA^ and 0.30 eA^ de¬ 
pending on training set size. The spread of MAEs across 
elements is strikingly more pronounced. Nevertheless, we 
find a larger correlation coefficient compared to dipoles: 
B? K, 65%, see Eig. [2] (c) and (d). Note that the Cl/F 
accuracy reversal with respect to the monopole model is 
also manifested for the A-ML MTP model of this rank. 


C. A-ML MTP vs ML MTP model 

We have compared the relative improvement gained 
when combining the ML with the baseline evaluation 
from the Voronoi scheme. Tab. HIl compares MAEs de¬ 
composed by chemical element for the prediction with 
85% training fraction both with (i.e., “A-ML-85”) and 
without (i.e., “ML-85”) prior Voronoi baseline evalua¬ 
tion. The table also specifies the number of atoms in¬ 
volved in the prediction pool (i.e., outside the training 
fraction). While the Voronoi scheme does nothing to im¬ 
prove monopoles (for F and Cl it even worsens the pre¬ 
diction) it is increasingly helpful as we move to dipoles 
(with negligible change for F and Cl), and quadrupoles 
(with small improvement for F and Cl). We stress that 
the observed trends for F and Cl should be interpreted 
with utmost caution since their frequency in the database 
is very small (363 and 739). The lack of improvement 
for monopoles stems directly from the Voronoi scheme’s 
strategy: Merely encoding symmetries, only higher MTP 
moments can benefit from the absence of a number of 
components that are forbidden by the underlying geom¬ 
etry. For fixed training size, the MAE is roughly halved 
for quadrupoles when using the delta learning procedure, 
compared to the standard ML methodology. 

All results discussed so far refer to ML models of 
atomic MTPs in neutral molecules. Eor positively and 
negatively charged compounds we have found ML mod¬ 
els to yield very similar trends and accuracy (data not 
shown). 


IV. VALIDATION 

To assess the introduced MTP model’s applicability we 
have used predicted electrostatic coefficients to evaluate 
intermolecular interaction energies in molecular dimers 
and organic crystals. To do this, we accounted only 
for static MTP electrostatics and many-body dispersion 
(MBD) interactions, 

Avdw ~ i^MTP + Ambd, (12) 

neglecting induction, penetration, and repulsion terms. 
A short description of the MBD formalism is provided 
in the next paragraph. As discussed above, our MTP 
and MBD-models also represent approximations in the 
form of the A-ML model and dipole-dipole-manybody 
expansion, respectively. To better gauge the effect of the 
introduced MTP-ML model, we also compare to vdW en¬ 
ergy predictions using quantum-mechanically (QM) de¬ 
rived MTPs. 

Common approximations in the exchange-correlation 
potential used in density functional theory lead to inad¬ 
equate predictions of dispersion interactions. This has 
motivated the development of a number of dispersion- 
corrected methods. We hereby rely on the method devel¬ 
oped by Tkatchenko and coworkers [1^, in which free- 
atom polarizabilities are first scaled according to their 
close environment following a partitioning of the electron 
density. The many-body dispersion up to infinite order 
(in the dipole approximation) is then obtained by diago¬ 
nalizing the Hamiltonian of a system of coupled quantum 
harmonic oscillators, thereby coupling the scaled atomic 
polarizabilities at long range. The importance of many- 
body effects and accuracy of the method has been demon¬ 
strated on a large variety of systems [d^. Later, a clas¬ 
sical approximation relaxed the requirement for an elec¬ 
tron density, using instead a physics-based approach to 
estimate how atomic polarizabilities needed to be scaled 
based on a Voronoi partitioning [^ . 


A. Molecular dimers 

To gauge the accuracy of the electrostatics alone, 
we compare the electrostatic componenent of reference 
symmetry adapted perturbation theory (SAPT) results 
[45l l46l| to the corresponding intermolecular components 
derived either from QM MTPs or from the A-ML-MTP 
model. Eig. |3] displays the correlation plot between the 
two model MTP electrostatics calculations and SAPT for 
the S22 dimers [13 at different intermolecular distances 
[IM 1^ . 113 . The plot confirms that both MTP models 
generally underestimate the electrostatic SAPT compo¬ 
nent of the interaction energies, presumably due to a lack 
of penetration effects. Encouragingly, as the intermolec¬ 
ular distance is increased, the MTP predictions system¬ 
atically recede to the perfect correlation line. 

Not all errors are distributed uniformly across com- 
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# atoms 

training (iVt) prediction 

ML-85 

q 

A-ML-85 

MAE [eA^] 

ML-85 A-ML-85 

Qa/3 

ML-85 A-ML-85 

H 

28,822 

9,607 

0.01 

0.01 

0.03 

0.01 

0.04 

0.02 

C 

24,356 

4,297 

0.01 

0.01 

0.05 

0.05 

0.18 

0.09 

N 

4,054 

715 

0.02 

0.02 

0.09 

0.05 

0.26 

0.15 

0 

6,134 

1,082 

0.02 

0.02 

0.08 

0.04 

0.22 

0.12 

F 

363 

63 

0.03 

0.09 

0.04 

0.05 

0.14 

0.11 

S 

1,542 

272 

0.05 

0.05 

0.12 

0.09 

0.31 

0.20 

Cl 

739 

130 

0.03 

0.06 

0.11 

0.10 

0.28 

0.26 


66,010 

16,166 

0.02 

0.04 

0.08 

0.06 

0.20 

0.13 


TABLE II. MAE for each chemical element and MTP rank of the ML-85 and A-ML-85 transferable models (neutral molecules), 
corresponding to an 85% training-set size for all elements but H (only 75%). The last lines averages over all chemical elements. 
The second column denotes the number of molecules for which the MTP moments have been predicted. 



FIG. 3. Scatter plot between reference energy compo¬ 
nents from SAPT and computed electrostatics from either 
quantum-mechanical coefficients, T^mtpi the universal A- 
ML-85 MTP model, E^tpi for intermolecular dimers in the 
S22x5 dataset [i^ . Dimers closer than the equilibrium dis¬ 
tance (i.e., factor 0.9) were systematically excluded. Symbols 
correspond to different distance factors that multiply the equi¬ 
librium distance: 0.9 (plus sign), 1.0 (cross), 1.2 (star), 1.5 
(open square), and 2.0 (filled square). The straight line cor¬ 
responds to ideal correlation. 


pounds. Fig. |4] compares the MTP energy contributions 
-^MTP ^MTP’ reference SAPT data for 

each molecular dimer of the S22 dataset (i.e., at their 
equilibrium distance). We find good correlations between 
the QM and MTP models, though we note a number of 
qualitative discrepancies. In particular, the ML model 
fails to reproduce the attractive nature of the electro¬ 
static interaction for the water and ammonia dimers. 
Presumably, the model fails to predict their coefficients 


due to the molecules’ unique chemical composition: the 
ML model relies on interpolations across the trained 
molecules, of which some must be chemically similar to 
the new compound. Larger molecules are less problem¬ 
atic because similarities between chemical fragments oc¬ 
cur far more frequently. We do find systematic deviations 
between the MTP energies and the SAPT reference elec¬ 
trostatic data for strongly hydrogen-bonding compounds, 
for which penetration effects @ become significant. 

We have calculated molecular dimer energies cor¬ 
responding to various datasets for which high-level 
quantum-chemistry numbers have previously been pub¬ 
lished. We have considered the following databases: S22 
[13, S22x5 HIL S66 and S66x8 [i|, SCAI and X40 
and X40xl0 |5l|. All MTP coefficients have been pre¬ 
dicted using the A-ML-85 model (see Fig. [5] and Tab.llll). 
We only considered dimers made up of the chemical ele¬ 
ments H, C, O, N, S, F, and Cl, keeping 992 out of over 
1,300 vdW dimers. 

Fig. [5] contrasts the scatter correlation between ref¬ 
erence intermolecular energies, -Erei, and the sum of 
many-body dispersion and ML-predicted MTP electro¬ 
statics, -I- Ambd- The mean-absolute error of all 

intermolecular estimates using the MTPs from individ¬ 
ual quantum-chemistry calculations and the ML pre¬ 
dictions amount to 2.36 and 2.19 kcal/mol, respectively. 
In other words, the ML MTP prediction is on par with 
MTPs derived from explicit electron densities generated 
by computationally demanding quantum-chemistry cal¬ 
culations. Interactions between charged-charged amino 
acids of the SCAI database are reasonably well repro¬ 
duced (see insets of Fig. [S]), pointing to the robustness 
of the method not only for neutral compounds, but also 
charged species. 

B. Benzene crystal 

Increasingly accurate and fast methods provide the 
means for crystal structure prediction of organic com¬ 
pounds [ 53 , to the point of ranking polymorphs of molec- 









FIG. 4. Intermolecular static electrostatic energy contribution for each dimer of the S22 dataset, for coefficients parametrized 
from quantum-mechanical calculations, transferable ML model A-ML-85, F'mtp- Electrostatic energies from 

reference SAPT calculations are also provided [i^ . 


ular crystals [El [53. Moving toward a condensed-phase 
system, we have also evaluated the cohesive binding en¬ 
ergy predictions of a molecular benzene crystal. Follow¬ 
ing previous workjEllE^, we have computed the binding 
energy for different ratios of the unit-cell density with re¬ 
spect to the experimental density [E3 j p/Pexp- Isotropic 
density scalings were performed without further opti¬ 
mization, and the binding energy included interactions 
with the neighboring unit cells, as discussed in our previ¬ 
ous publication [ 2 I. Fig. [5] features the MTP and MBD 
based vdW estimates of cohesive energy as a function of 
density. Again, we compare the resulting numbers once 
using the A-ML-predicted MTP electrostatics, combined 
with MBD, and once using the QM MTP model com¬ 
bined with MBD. We find very good agreement between 
the two methods, validating here as well the ML model. 
Coincidentally, the two methods yield cohesive binding 
energies in good agreement with the experimental value. 
The lack of repulsive interactions prohibits a further in¬ 
crease in energy as p gets larger [^ . 


V. CONCLUSION 

We have introduced machine-learning models for elec¬ 
trostatic multipoles (MTPs) of H, C, O, N, S, F, and Cl 
atom types. The models have been trained on atomic 
multipole coefficients of small organic molecules evalu¬ 
ated using first principles calculations. Neutral, cationic, 
and anionic molecular states were treated with separate 
models. The model yields highly accurate MTPs for H, 
reasonable performance for C, N, O, and significant er¬ 
rors for S, F and Cl due to their sparsity in the training 
set. 

Focusing on the intermolecular S22 dimer dataset, 
MTP energies show good correlation between the coef¬ 
ficients parametrized from ML and individual quantum- 
chemistry calculations. A comparison with reference 
electrostatic interactions from symmetry-adapted pertur¬ 
bation theory (SAPT) is satisfactory for large intermolec¬ 
ular separations, and impaired by the lack of penetration 
effects at short distances. Furthermore, MTPs from the 
ML model have been combined with a classical many- 
body dispersion potential to estimate intermolecular en- 
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^MTP + -^mbd [kcal/mol] 

FIG. 5. Correlation plot between reference vdW ener¬ 
gies, Eysi, and energies predicted from the classical many- 
body dispersion method with static electrostatics, the latter 
parametrized from the A-ML-85 model, Smtp + ^-mbd- The 
inset corresponds to the two charged-charged side-chain inter¬ 
actions of the SCAI database. The present lack of repulsion, 
induction, and penetration effects is at cause for the outliers. 



FIG. 6. Cohesive binding energy of the benzene crystal as a 
function of the ratio of densities, p/pexp- Both models for 
intermolecular energies are presented: + F-mbd and 

Emtp + F-mbd. The experimental value is shown explicitly 
(black dot) [s^ . 


ergies of nearly 1,000 molecular dimers as well as the 
cohesive energy of the benzene crystal. The results show 
that the ML model retains overall a similar accuracy com¬ 
pared to calculations with the MTPs parametrized from 
individual quantum-chemical calculations. 

The A-ML approach, which augments a physics-based 
baseline model by a ML model, has proven to be use¬ 
ful to more efficiently train vector and tensor quantities. 
Incorporation of molecular symmetries via the Voronoi 
partitioning of the charge density, included in the base¬ 
line model, is at the heart of this improvement. 

The proposed models alleviate the need for 
quantum-chemistry calculations for every single 
molecule/molecular conformation in a perturbative 
evaluation of intermolecular interactions, bringing 
us one step forward toward the task of automated 
parametrizations of accurate state-independent and 
transferable force fields. 
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